Bump dflash-mlx from 0.1.0 to 0.1.7 #79

Open
dependabot[bot] wants to merge 1 commit into main from dependabot/pip/dflash-mlx-0.1.7
dependabot[bot] commented on behalf of github on May 23, 2026

Bumps dflash-mlx from 0.1.0 to 0.1.7.

Release notes

Sourced from dflash-mlx's releases.

dflash-mlx v0.1.7

Big adaptive-runtime update focused mostly on Qwen3.6 27B 4-bit and real long-context usage.

The main change is the adaptive verify policy.

DFlash normally drafts a block and asks the target to verify it. Large verify blocks, like M=16, are great when acceptance is strong because one target pass can commit many tokens. But when the draft gets weaker, large blocks waste target work on suffix tokens that will be rejected anyway.
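The trade-off described above can be illustrated with a small model. This is a minimal sketch, not dflash-mlx code: it assumes each drafted token is accepted independently with a fixed probability and that verification stops at the first rejection. The function name `expected_accepted` and the probabilities used are hypothetical, and any extra token the target may contribute per pass is ignored.

```python
def expected_accepted(block_size: int, accept_prob: float) -> float:
    """Expected number of drafted tokens accepted in one verify pass.

    Assumption (not from the release notes): tokens are accepted
    independently with probability accept_prob, and the first rejection
    discards the rest of the block.
    """
    expected = 0.0
    survive = 1.0  # probability that every token so far was accepted
    for _ in range(block_size):
        survive *= accept_prob
        expected += survive
    return expected


if __name__ == "__main__":
    for p in (0.9, 0.5):
        for m in (16, 4):
            accepted = expected_accepted(m, p)
            print(f"p={p} M={m}: {accepted:.2f} accepted, {m - accepted:.2f} wasted")
```

Under this model, a strong draft lets M=16 accept far more tokens per pass than M=4, while a weak draft gains almost nothing from the extra twelve verified positions.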

v0.1.7 makes that decision dynamic:

  • Start from the normal large-block path.
  • Watch recent acceptance, tokens per cycle, and real cycle wall cost.
  • If the large block stops paying off, drop to a smaller M=4 verify block.
  • Stay there for a burst while acceptance stabilizes.
  • Periodically probe back up to the large block.
  • Resume M=16 only when the probe is actually better than the reduced path.

So the goal is not just “higher acceptance”. The goal is committed tokens per real unit of time. M=4 can be better when acceptance is low; M=16 can be better when the draft is stable. The runtime now measures that instead of forcing one block size everywhere.
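The policy steps above can be sketched as a small state machine. This is an illustrative sketch only, not the dflash-mlx implementation: the class name `AdaptiveVerify`, its fields, and every threshold are hypothetical. Only the block sizes 16 and 4 and the drop / burst / probe / resume sequence come from the release notes.

```python
from collections import deque
from dataclasses import dataclass, field
from statistics import fmean

LARGE, SMALL = 16, 4  # block sizes named in the release notes


@dataclass
class AdaptiveVerify:
    # All thresholds are hypothetical tuning knobs, not dflash-mlx values.
    min_tokens_per_cycle: float = 4.0  # drop when the large block commits less
    window: int = 8                    # large-block cycles averaged per decision
    burst: int = 16                    # small-block cycles before a probe
    probe: int = 4                     # large-block cycles per probe
    block: int = LARGE
    probing: bool = False
    _tokens: deque = field(default_factory=deque)
    _rates: deque = field(default_factory=deque)
    _small_rate: float = 0.0

    def _reset(self, block: int, probing: bool = False) -> None:
        self.block, self.probing = block, probing
        self._tokens.clear()
        self._rates.clear()

    def record(self, committed: int, seconds: float) -> int:
        """Record one verify cycle; return the block size for the next one."""
        self._tokens.append(committed)
        self._rates.append(committed / seconds)  # committed tokens per second
        n = len(self._tokens)
        if self.block == LARGE and self.probing:
            if n >= self.probe:
                # Resume the large block only if the probe beat the small path.
                if fmean(self._rates) > self._small_rate:
                    self._reset(LARGE)
                else:
                    self._reset(SMALL)
        elif self.block == LARGE:
            if n >= self.window:
                if fmean(self._tokens) < self.min_tokens_per_cycle:
                    self._reset(SMALL)  # large block stopped paying off
                else:
                    self._tokens.popleft()  # keep a sliding window
                    self._rates.popleft()
        elif n >= self.burst:
            # Small-block burst finished: remember its rate, then probe upward.
            self._small_rate = fmean(self._rates)
            self._reset(LARGE, probing=True)
        return self.block
```

A decode loop would call `record(committed, seconds)` after each verify cycle and use the returned value as the next block size. The comparison is made on committed tokens per second rather than on acceptance alone, which matches the stated goal of the release.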

Highlights:

  • retuned adaptive verify for long-context / agentic decode
  • richer metrics: tokens/cycle, adaptive block state, per-mode/per-block speed, CopySpec counters
  • /metrics now exposes real decode average tok/s plus logical / real / restored prefill rates
  • AIME25 benchmark suite with exact integer scoring
  • Qwen thinking default now follows tokenizer/request behavior instead of forcing thinking off
  • GDN recurrent exactness fixes around state dtype in gated-delta tape/tree kernels
  • public README benchmark artifacts for Qwen3.6 27B 4-bit at 1k / 2k / 4k / 8k / 16k

Measured README prompt, Qwen3.6 27B 4-bit, M5 Max, stock mlx_lm baseline, repeat=3, cooldown=120s, no EOS:

  • 1024: baseline 33.26 tok/s, DFlash 98.05 tok/s, 2.95x
  • 2048: baseline 32.34 tok/s, DFlash 90.67 tok/s, 2.81x
  • 4096: baseline 30.58 tok/s, DFlash 93.55 tok/s, 3.06x
  • 8192: baseline 26.03 tok/s, DFlash 79.12 tok/s, 3.04x
  • 16384: baseline 21.50 tok/s, DFlash 60.77 tok/s, 2.78x
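The speedup figures are DFlash throughput divided by baseline throughput. A quick check that recomputes them from the rounded tok/s values in the list above is shown below. The recomputed ratios land within 0.05 of the listed ones; the small gaps at 2048 and 16384 would be expected if the listed ratios came from unrounded or per-run measurements, which is an assumption and is not stated in the release notes.

```python
# (context length, baseline tok/s, DFlash tok/s, listed speedup) from the notes
ROWS = [
    (1024, 33.26, 98.05, 2.95),
    (2048, 32.34, 90.67, 2.81),
    (4096, 30.58, 93.55, 3.06),
    (8192, 26.03, 79.12, 3.04),
    (16384, 21.50, 60.77, 2.78),
]


def speedup(baseline: float, dflash: float) -> float:
    """Ratio of DFlash throughput to baseline throughput."""
    return dflash / baseline


if __name__ == "__main__":
    for ctx, base, fast, listed in ROWS:
        print(f"{ctx:>6}: {speedup(base, fast):.2f}x recomputed, {listed:.2f}x listed")
```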

This release is mostly about making DFlash more usable and observable in real runs, especially long-context coding/agentic workloads.

dflash-mlx v0.1.6

Large runtime, server, and agentic-workflow release since v0.1.5, including the v0.1.5.1 fixes.

Highlights

  • Reworked runtime ownership around typed runtime config, RuntimeBundle, ServerRuntime, target adapters, draft loading, cache management, and observability.
  • Default verify policy is now adaptive; fixed DFlash verification is available as --verify-mode dflash.
  • Added explicit verify modes: adaptive, dflash, ddtree, and off.
  • Added DDTree branch verification mode for Qwen target paths.
  • Added internal CopySpec candidate reuse for repeated-token continuation from prompt/generated history.
  • Added target-owned Qwen and Gemma4 backend routing, with unknown model families failing closed instead of falling into generic logic.

... (truncated)

Commits

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [dflash-mlx](https://github.com/bstnxbt/dflash-mlx) from 0.1.0 to 0.1.7.
- [Release notes](https://github.com/bstnxbt/dflash-mlx/releases)
- [Commits](https://github.com/bstnxbt/dflash-mlx/commits/v0.1.7)

---
updated-dependencies:
- dependency-name: dflash-mlx
  dependency-version: 0.1.7
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
dependabot[bot] added the dependencies and python labels on May 23, 2026
dependabot[bot] requested a review from youssofal as a code owner on May 23, 2026 10:33